Single-view RGB-D human reconstruction with implicit functions is often formulated as per-point classification. Specifically, a set of 3D locations within the camera's view frustum is first projected onto the image, and a corresponding feature is extracted for each 3D location. The feature of each 3D location is then used to independently classify whether the corresponding 3D point lies inside or outside the observed subject. This procedure leads to sub-optimal results because correlations between the predictions of neighboring locations are only taken into account implicitly through the extracted features. For more accurate results, we propose the occupancy planes (OPlanes) representation, which formulates single-view RGB-D human reconstruction as occupancy prediction on planes that slice through the camera's view frustum. Such a representation provides more flexibility than a voxel grid and enables correlations to be exploited better than per-point classification. On the challenging S3D data, we observe that a simple classifier based on the OPlanes representation yields compelling results, especially in difficult situations with partial occlusion caused by other objects and partial visibility, which have not been addressed by prior work.
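To make the contrast concrete, here is a minimal PyTorch sketch (hypothetical module and tensor shapes, not the authors' released code) of plane-wise occupancy prediction: instead of classifying each sampled 3D point independently from its projected feature, a small convolutional head predicts a full 2D occupancy map for every depth plane slicing the view frustum, so neighboring predictions share spatial context.

```python
import torch
import torch.nn as nn

class OccupancyPlaneHead(nn.Module):
    """Predicts one 2D occupancy map per depth plane from image features.

    Hypothetical sketch: a conv decoder conditioned on the plane depth,
    so predictions within a plane share spatial context (unlike
    independent per-point classification).
    """

    def __init__(self, feat_dim=64, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(feat_dim + 1, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, 1, 1),  # occupancy logit per pixel
        )

    def forward(self, feats, plane_depths):
        # feats: (B, C, H, W) image features; plane_depths: (P,) plane depths
        B, _, H, W = feats.shape
        logits = []
        for d in plane_depths:
            depth_map = torch.full((B, 1, H, W), float(d), device=feats.device)
            logits.append(self.net(torch.cat([feats, depth_map], dim=1)))
        return torch.stack(logits, dim=1)  # (B, P, 1, H, W) occupancy logits


feats = torch.randn(2, 64, 32, 32)          # toy backbone features
depths = torch.linspace(0.5, 3.0, steps=8)  # planes slicing the frustum
occ = OccupancyPlaneHead()(feats, depths)
print(occ.shape)  # torch.Size([2, 8, 1, 32, 32])
```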
Although recovering geometry from image and video data has received a lot of attention in computer vision, methods that capture the texture of a given geometry are less mature. Specifically, classical texture generation methods typically assume clean geometry and reasonably well-aligned image data. While recent approaches, e.g., adversarial texture optimization, handle low-quality data obtained from hand-held devices better, we find that they still struggle frequently. To improve robustness, in particular of recent adversarial texture optimization, we develop an explicit initialization and an alignment procedure. It handles complex geometry thanks to a geometry-aware painting of the texture map and a hard-assignment-based initialization. It handles misalignment of geometry and images by integrating fast image alignment into the texture refinement optimization. We demonstrate the efficacy of our texture generation on a dataset of 11 scenes with a total of 2807 frames, observing relative improvements of 7.8% and 11.1% in perceptual and sharpness measurements.
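A rough NumPy sketch of the hard-assignment initialization idea, under the assumption that each texel comes with a surface point and normal; the function, its arguments, and the view-selection rule are illustrative, not the paper's implementation.

```python
import numpy as np

def initialize_texture(texel_pts, texel_nrm, images, K, w2c_list):
    """Hard-assignment texture initialization (hypothetical sketch).

    For every texel, pick the input view that observes its surface point most
    frontally and copy that pixel's color; refinement can then start from this
    painted texture instead of from scratch.
    texel_pts: (T, 3) surface points, texel_nrm: (T, 3) unit normals,
    images: list of (H, W, 3) arrays, K: (3, 3) intrinsics,
    w2c_list: list of (4, 4) world-to-camera poses.
    """
    T = texel_pts.shape[0]
    colors = np.zeros((T, 3))
    best = np.full(T, -np.inf)
    for img, w2c in zip(images, w2c_list):
        H, W, _ = img.shape
        R, t = w2c[:3, :3], w2c[:3, 3]
        cam_pts = texel_pts @ R.T + t                      # points in camera frame
        uvz = cam_pts @ K.T
        z = uvz[:, 2]
        u = uvz[:, 0] / np.maximum(z, 1e-8)
        v = uvz[:, 1] / np.maximum(z, 1e-8)
        cam_center = -R.T @ t                              # camera position in world frame
        view_dir = cam_center[None, :] - texel_pts
        view_dir /= np.linalg.norm(view_dir, axis=1, keepdims=True)
        score = (texel_nrm * view_dir).sum(axis=1)         # how frontally this view sees the texel
        ok = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H) & (score > best)
        colors[ok] = img[v[ok].astype(int), u[ok].astype(int)]
        best[ok] = score[ok]
    return colors


# toy usage with random data
pts = np.random.randn(100, 3)
nrm = pts / np.linalg.norm(pts, axis=1, keepdims=True)
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
views = [np.eye(4) for _ in range(2)]
views[1][:3, 3] = [0, 0, 2.0]
imgs = [np.random.rand(480, 640, 3) for _ in views]
print(initialize_texture(pts, nrm, imgs, K, views).shape)  # (100, 3)
```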
What is really needed to make an existing 2D GAN 3D-aware? To answer this question, we modify a classical GAN, i.e., StyleGANv2, as little as possible. We find that only two modifications are absolutely necessary: 1) a multiplane image style generator branch which produces a set of alpha maps conditioned on their depth; 2) a pose-conditioned discriminator. We refer to the generated output as a "generative multiplane image" (GMPI) and emphasize that its renderings are not only high-quality but also guaranteed to be view-consistent, which makes GMPIs different from many prior works. Importantly, the number of alpha maps can be dynamically adjusted and can differ between training and inference, alleviating memory concerns and enabling fast training of GMPIs at a resolution of 1024^2 in less than half a day. Our findings are consistent across three challenging and common high-resolution datasets, including FFHQ, AFHQv2, and MetFaces.
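For reference, rendering a multiplane image reduces to standard back-to-front alpha compositing; the sketch below assumes planes ordered from nearest to farthest and illustrates why the number of alpha maps can be chosen freely at inference time.

```python
import torch

def composite_mpi(rgb, alpha):
    """Back-to-front alpha compositing of a multiplane image (sketch).

    rgb:   (P, 3, H, W) color planes, ordered from nearest to farthest.
    alpha: (P, 1, H, W) per-plane alpha maps in [0, 1].
    Returns the composited (3, H, W) image. The number of planes P is a free
    choice, which is the flexibility the abstract highlights.
    """
    out = torch.zeros_like(rgb[0])
    for p in reversed(range(rgb.shape[0])):   # farthest plane first
        out = alpha[p] * rgb[p] + (1.0 - alpha[p]) * out
    return out


planes, H, W = 32, 64, 64
rgb = torch.rand(planes, 3, H, W)
alpha = torch.rand(planes, 1, H, W)
image = composite_mpi(rgb, alpha)
print(image.shape)  # torch.Size([3, 64, 64])
```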
Deep stereo matching has made significant progress in recent years. However, state-of-the-art methods are based on expensive 4D cost volumes, which limits their application in the real world. To address this issue, 3D correlation maps and iterative disparity updates have been proposed. On real-world platforms such as autonomous vehicles and robots, a LiDAR is usually installed. We therefore further introduce sparse LiDAR points into the iterative updates, which relieves the network of the burden of updating the disparity from an all-zero initial state. In addition, we propose to train the network in a self-supervised manner so that it can be trained on any captured data for better generalization. Experiments and comparisons show that the presented method is effective and achieves results comparable to related methods.
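A minimal sketch of the LiDAR seeding idea, assuming the LiDAR points have already been projected into the reference view and converted to disparities; the function name and the fallback rule are illustrative, not the paper's exact scheme.

```python
import torch

def init_disparity_from_lidar(lidar_disp, lidar_mask):
    """Seed the disparity field from sparse LiDAR instead of zeros (sketch).

    lidar_disp: (B, 1, H, W) disparities converted from projected LiDAR points.
    lidar_mask: (B, 1, H, W) binary mask marking pixels with a LiDAR hit.
    Pixels without a LiDAR hit fall back to the mean of the valid ones, so the
    iterative updates start from a coarse but non-zero state.
    """
    valid = lidar_mask > 0
    mean_disp = (lidar_disp * lidar_mask).sum(dim=(2, 3), keepdim=True) / \
                lidar_mask.sum(dim=(2, 3), keepdim=True).clamp(min=1.0)
    return torch.where(valid, lidar_disp, mean_disp.expand_as(lidar_disp))


# toy example: about 1% of pixels carry a LiDAR measurement
B, H, W = 1, 48, 96
mask = (torch.rand(B, 1, H, W) < 0.01).float()
disp = torch.rand(B, 1, H, W) * 64.0 * mask
init = init_disparity_from_lidar(disp, mask)
print(init.shape, init.mean().item())
```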
Existing methods detect keypoints in a non-differentiable way and therefore cannot directly optimize keypoint positions through back-propagation. To address this issue, we present a differentiable keypoint detection module that outputs accurate sub-pixel keypoints. A reprojection loss is then proposed to directly optimize these sub-pixel keypoints, and a dispersity peak loss is presented for accurate keypoint regularization. We also extract descriptors in a sub-pixel manner and train them with a stable neural reprojection error loss. Moreover, a lightweight network is designed for keypoint detection and descriptor extraction, which can run at 95 frames per second on a commercial GPU. On homography estimation, camera pose estimation, and visual (re-)localization tasks, the proposed method achieves performance on par with state-of-the-art methods while greatly reducing inference time.
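One common way to obtain differentiable sub-pixel keypoints is a local soft-argmax over the score map; the sketch below illustrates that mechanism under the assumption of such a refinement step (it is not necessarily the exact module used in the paper).

```python
import torch
import torch.nn.functional as F

def soft_subpixel_keypoints(score_map, coarse_kpts, radius=2, temperature=0.1):
    """Differentiable sub-pixel keypoint refinement via a local soft-argmax
    (hypothetical sketch).

    score_map:   (H, W) keypoint score map.
    coarse_kpts: (N, 2) integer (x, y) locations of local maxima.
    Returns (N, 2) sub-pixel keypoints that stay differentiable w.r.t. the
    score map, so a reprojection loss can back-propagate into the detector.
    """
    offsets = torch.arange(-radius, radius + 1, dtype=torch.float32)
    dy, dx = torch.meshgrid(offsets, offsets, indexing="ij")
    refined = []
    for x, y in coarse_kpts.long():
        patch = score_map[y - radius:y + radius + 1, x - radius:x + radius + 1]
        w = F.softmax(patch.flatten() / temperature, dim=0).reshape(patch.shape)
        refined.append(torch.stack([x + (w * dx).sum(), y + (w * dy).sum()]))
    return torch.stack(refined)


scores = torch.rand(120, 160, requires_grad=True)
coarse = torch.tensor([[40, 30], [100, 80]])
kpts = soft_subpixel_keypoints(scores, coarse)
kpts.sum().backward()   # gradients flow back into the score map
print(kpts)
```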
We introduce REDO, a class-agnostic framework to reconstruct dynamic objects from RGBD or calibrated videos. Compared to prior work, our problem setting is more realistic yet more challenging for three reasons: 1) due to occlusion or camera settings, an object of interest may never be entirely visible, but we aim to reconstruct the complete shape; 2) we aim to handle different object dynamics, including rigid motion, non-rigid motion, and articulation; 3) we aim to reconstruct different categories of objects with a single unified framework. To address these challenges, we develop two novel modules. First, we introduce a canonical 4D implicit function which is pixel-aligned with aggregated temporal visual cues. Second, we develop a 4D transformation module which captures object dynamics to support temporal propagation and aggregation. We study the efficacy of REDO in extensive experiments on the synthetic RGBD video datasets SAIL-VOS 3D and DeformingThings4D++, as well as the real-world video data 3DPW. We find that REDO outperforms state-of-the-art dynamic reconstruction methods. In ablation studies, we validate each developed component.
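A toy sketch of a pixel-aligned, time-conditioned implicit occupancy function, with invented feature sizes and without the 4D transformation module; it only illustrates the querying pattern described above, not the released REDO code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelAligned4DOccupancy(nn.Module):
    """Pixel-aligned implicit occupancy conditioned on time (rough sketch)."""

    def __init__(self, feat_dim=32, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 3 + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),   # occupancy logit
        )

    def forward(self, feat_map, pts, uv, t):
        # feat_map: (B, C, H, W) image features, pts: (B, N, 3) query points,
        # uv: (B, N, 2) their projections in [-1, 1], t: (B, N, 1) timestamps.
        sampled = F.grid_sample(feat_map, uv.unsqueeze(2), align_corners=True)
        sampled = sampled.squeeze(-1).permute(0, 2, 1)      # (B, N, C) pixel-aligned features
        return self.mlp(torch.cat([sampled, pts, t], dim=-1))


B, N = 2, 256
model = PixelAligned4DOccupancy()
occ = model(torch.randn(B, 32, 64, 64), torch.randn(B, N, 3),
            torch.rand(B, N, 2) * 2 - 1, torch.rand(B, N, 1))
print(occ.shape)  # torch.Size([2, 256, 1])
```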
Keypoint matching is a crucial component of many image-related applications such as image stitching, visual simultaneous localization and mapping (SLAM), and so on. Both hand-crafted and recently emerged deep learning-based keypoint matching methods rely only on keypoints and local features, while losing sight of other sensors available in these applications, such as the inertial measurement unit (IMU). In this paper, we demonstrate that the motion estimate obtained from IMU integration can be used to exploit the prior spatial distribution of keypoints between images. To this end, a probabilistic perspective on attention is proposed to naturally integrate the spatial distribution prior into an attentional graph neural network. With the help of the spatial distribution prior, the effort required of the network to model the hidden features can be reduced. Furthermore, we propose a projection loss for the proposed keypoint matching network, which gives a smooth edge between matched and unmatched keypoints. Image matching experiments on visual SLAM datasets indicate the effectiveness and efficiency of the presented method.
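One way to illustrate integrating a spatial prior into attention is to add a log-Gaussian bias, centered at the IMU-predicted keypoint locations, to the attention logits; the following sketch assumes that formulation (names and the Gaussian form are illustrative, not the paper's exact derivation).

```python
import torch

def attention_with_spatial_prior(q, k, v, kpts_b, kpts_b_pred, sigma=16.0):
    """Attention whose logits are biased by a spatial prior (sketch).

    q, k, v:     (N_a, D), (N_b, D), (N_b, D) features of keypoints in images A and B.
    kpts_b:      (N_b, 2) keypoint positions in image B (pixels).
    kpts_b_pred: (N_a, 2) positions in image B predicted for each keypoint of A
                 by warping with the IMU-integrated relative motion.
    The prior is a Gaussian on pixel distance, added in log-space so that
    candidate matches far from the motion prediction are down-weighted.
    """
    logits = q @ k.t() / q.shape[-1] ** 0.5                          # (N_a, N_b) feature similarity
    dist2 = ((kpts_b_pred[:, None, :] - kpts_b[None, :, :]) ** 2).sum(-1)
    log_prior = -dist2 / (2.0 * sigma ** 2)                          # log of Gaussian spatial prior
    attn = torch.softmax(logits + log_prior, dim=-1)
    return attn @ v


Na, Nb, D = 64, 80, 128
out = attention_with_spatial_prior(torch.randn(Na, D), torch.randn(Nb, D),
                                   torch.randn(Nb, D),
                                   torch.rand(Nb, 2) * 640,
                                   torch.rand(Na, 2) * 640)
print(out.shape)  # torch.Size([64, 128])
```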
The objective of this paper is to learn dense 3D shape correspondence for topology-varying generic objects in an unsupervised manner. Conventional implicit functions estimate the occupancy of a 3D point given a shape latent code. Instead, our novel implicit function produces a probabilistic embedding to represent each 3D point in a part embedding space. Assuming the corresponding points are similar in the embedding space, we implement dense correspondence through an inverse function mapping from the part embedding vector to a corresponding 3D point. Both functions are jointly learned with several effective and uncertainty-aware loss functions to realize our assumption, together with the encoder generating the shape latent code. During inference, if a user selects an arbitrary point on the source shape, our algorithm can automatically generate a confidence score indicating whether there is a correspondence on the target shape, as well as the corresponding semantic point if there is one. Such a mechanism inherently benefits man-made objects with different part constitutions. The effectiveness of our approach is demonstrated through unsupervised 3D semantic correspondence and shape segmentation.
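A toy sketch of the forward/inverse function pair together with a cycle-consistency-style confidence score, with invented dimensions and without the uncertainty-aware losses; it illustrates the querying pattern, not the paper's architecture.

```python
import torch
import torch.nn as nn

class PartEmbedding(nn.Module):
    """Forward/inverse implicit functions for dense correspondence (toy sketch)."""

    def __init__(self, z_dim=256, emb_dim=16, hidden=128):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(3 + z_dim, hidden), nn.ReLU(),
                               nn.Linear(hidden, emb_dim))     # point -> part embedding
        self.g = nn.Sequential(nn.Linear(emb_dim + z_dim, hidden), nn.ReLU(),
                               nn.Linear(hidden, 3))           # embedding -> point

    def correspond(self, p_src, z_src, z_tgt):
        e = self.f(torch.cat([p_src, z_src], dim=-1))          # embed the source point
        p_tgt = self.g(torch.cat([e, z_tgt], dim=-1))          # decode it on the target shape
        e_back = self.f(torch.cat([p_tgt, z_tgt], dim=-1))
        confidence = torch.exp(-(e_back - e).norm(dim=-1))     # cycle-consistency score
        return p_tgt, confidence


model = PartEmbedding()
p_src = torch.randn(8, 3)
z_src, z_tgt = torch.randn(8, 256), torch.randn(8, 256)
p_tgt, conf = model.correspond(p_src, z_src, z_tgt)
print(p_tgt.shape, conf.shape)  # torch.Size([8, 3]) torch.Size([8])
```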
Machine-Generated Text (MGT) detection, the task of discriminating MGT from Human-Written Text (HWT), plays a crucial role in preventing misuse of text generative models, which have recently excelled at mimicking human writing style. Recently proposed detectors usually take a coarse text sequence as input and achieve good results by fine-tuning pretrained models with a standard cross-entropy loss. However, these methods fail to consider the linguistic aspects of text (e.g., coherence) and sentence-level structures. Moreover, they lack the ability to handle the low-resource problem, which can often arise in practice given the enormous amount of textual data online. In this paper, we present a coherence-based contrastive learning model named CoCo to detect possible MGT under the low-resource scenario. Inspired by the distinctiveness and permanence properties of linguistic features, we represent text as a coherence graph to capture its entity consistency, which is further encoded by the pretrained model and a graph neural network. To tackle the challenges of data limitations, we employ a contrastive learning framework and propose an improved contrastive loss to make full use of hard negative samples in the training stage. Experimental results on two public datasets show that our approach significantly outperforms state-of-the-art methods.
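As an illustration of contrastive training on text embeddings, the sketch below implements a generic supervised contrastive loss in which highly similar negatives naturally dominate the denominator; the exact CoCo objective and its hard-negative weighting may differ.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z, labels, tau=0.1):
    """Supervised contrastive loss over text embeddings (generic sketch).

    z:      (N, D) document embeddings from the encoder.
    labels: (N,) class labels (e.g., 0 = human-written, 1 = machine-generated);
            same-label pairs are positives, all others are negatives.
    """
    z = F.normalize(z, dim=-1)
    sim = z @ z.t() / tau
    self_mask = torch.eye(len(z), dtype=torch.bool)
    sim = sim.masked_fill(self_mask, float("-inf"))          # exclude self-similarity
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    log_prob = sim - torch.logsumexp(sim, dim=-1, keepdim=True)
    per_anchor = -log_prob.masked_fill(~pos, 0.0).sum(dim=-1) / pos.sum(dim=-1).clamp(min=1)
    return per_anchor.mean()


z = torch.randn(16, 768)
labels = torch.randint(0, 2, (16,))
print(contrastive_loss(z, labels).item())
```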
Communication is supposed to improve multi-agent collaboration and overall performance in cooperative multi-agent reinforcement learning (MARL). However, such improvements are often limited in practice since most existing communication schemes ignore communication overheads (e.g., communication delays). In this paper, we demonstrate that ignoring communication delays has detrimental effects on collaboration, especially in delay-sensitive tasks such as autonomous driving. To mitigate this impact, we design a delay-aware multi-agent communication model (DACOM) that adapts communication to delays. Specifically, DACOM introduces a component, TimeNet, that is responsible for adjusting the waiting time of an agent to receive messages from other agents, so that the uncertainty associated with delay can be addressed. Our experiments reveal that DACOM achieves a non-negligible performance improvement over other mechanisms by making a better trade-off between the benefits of communication and the cost of waiting for messages.
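A toy sketch of the waiting-time idea: a small network predicts how long an agent should wait, and only messages whose delay falls within that window are aggregated. The module name, sizes, and mean aggregation are assumptions for illustration, not the DACOM implementation.

```python
import torch
import torch.nn as nn

class TimeNetSketch(nn.Module):
    """Learned waiting-time policy for delay-aware communication (toy sketch)."""

    def __init__(self, obs_dim=16, msg_dim=8, max_wait=1.0):
        super().__init__()
        self.wait_head = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(),
                                       nn.Linear(32, 1), nn.Sigmoid())
        self.max_wait = max_wait

    def forward(self, obs, messages, delays):
        # obs: (obs_dim,) local observation, messages: (M, msg_dim), delays: (M,) seconds
        wait = self.wait_head(obs) * self.max_wait          # how long to wait for messages
        arrived = delays <= wait                            # messages received within the window
        if arrived.any():
            agg = messages[arrived].mean(dim=0)             # aggregate received messages
        else:
            agg = torch.zeros(messages.shape[-1])           # nothing arrived in time
        return wait, agg


net = TimeNetSketch()
wait, agg = net(torch.randn(16), torch.randn(5, 8), torch.rand(5))
print(wait.item(), agg.shape)
```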